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Data from the social-media site, Twitter, is used to study the fluctuations in tweet 
rates of brand names. The tweet rates are the result of a strongly correlated user be¬ 
havior, which leads to bursty collective dynamics with a characteristic 1 /f noise. Here 
we use the aggregated ’’user interest” in a brand name to model collective human 
dynamics by a stochastic differential equation with multiplicative noise. The model 
is supported by a detailed analysis of the tweet rate fluctuations and it reproduces 
both the exact bursty dynamics found in the data and the 1/f noise. 



2 


I. INTRODUCTION 

In the online era, humans are connected in real time on global scales. Local or seemingly 
local information is instantaneously shared across geographical boundaries. In particular, 
social online media have become an important platform for the sharing of information and 
have allowed for detailed studies of the coherent behavior of humans on a global scale 
PHS]. The popular microblogging platform Twitter is a good source for such studies for 
two reasons. First, Twitter is more about providing news updates than developing social 
networks mi- User behavior is therefore to a large extent influenced by information 
available via other information channels in society. Secondly, users respond to available 
information by submitting short public messages, ’’tweets”, of up to 140 characters that 
may be seen as proxies for the public interest. Recent research on Twitter has used the 
activity levels in forecasting real-world events including fluctuations of stock market prices 
[9j, real-time detection of the location and spread of earthquakes hitting populated areas 
[TO] , and for sentiment analysis and opinion mining HU 

In a recent paper [4], fluctuations in the tweet rates of 92 brand names are shown to be 
distributed with a power law tail with an exponent of —2.9 ± 0.4(SD). The broad tail of the 
distribution is characteristic for bursty activity levels. It is moreover found that the power 
spectral density of the tweet rate signals are described by a power law with an exponent of 
— 1.0±0.4(SD). This so called, ”l/f noise”, is found in a range of complex systems including 
heartbeats (G2), DNA base sequences ([13]) an d condensed matter systems am and it 
is interpreted as a sign of a pronounced memory in the systems (HU)- We attribute the 
power spectral density and the broad distribution in the tweet rate fluctuations to a strong 
correlation on a global scale in the collective human dynamics. 

In this paper we consider the global user interest in a brand, which in our definition is the 
likelihood for a tweet to mention the brand name. The global interest in a topic is expected 
to change in a continuous and random fashion as the result of many independent events 
in society. We shall therefore describe the global user interest by a stochastic differential 
equation (SDE), which we derive by analyzing the fluctuations in the tweet rate. The SDE 
predicts simultaneously the power law exponents of the tweet rate distribution and the 
strong memory in the temporal variation of the tweet rates. 

The paper is organized as follows, first we briefly describe the data acquired from Twitter 
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and explain how the data is turned into a tweet rate. Then we introduce a method for 
analyzing the fluctuations in the the tweet rate and demonstrate how it works on a generic 
signal. Our method is applied to data and supports an SDE with multiplicative noise. 
Finally, we show that the noise term in the SDE reproduces the power law distribution of 
the tweet rate as well as the power spectral density of the temporal signal. 



FIG. 1. Tweet rate signals We show the number of tweets measured in a time window, AT = 
lOrnin, for a few brands. Note the regular daily variation and the irregular bursty behavior. 


II. METHODS 

A. Data collection and time signals 

We used the public REST API by Twitter to collect tweets containing one of seven brand 
names during a time period in the fall 2012 and the spring 2014 (see Supporting Information 
sections SI and S2 for data and a description of time periods). The brand names considered 
are ”Samsung”, ”Pepsi”, ”Heineken”, ”Gucci”, ”Starbucks”, "BMW”, and ”Google”. In 
the analysis, we chose to use international brand names for a number of reasons. First, 
the brand names are used globally and the users posting tweets about the brands in general 
transcend local communities. Secondly, the brands are sufficiently popular that a continuous 
and robust stream of tweets exists. 

From the tweets collected, we save the time t * where a tweet is posted. The index i refers 
to the identification number of a given tweet. From the individual tweets, we form a time 
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signal by summing over all tweets mentioning a given brand, 

s(t) = 

i 

where S(t — tf) is the Dirac delta function. The time signal is turned into a tweet rate, 
x(t), by dividing the time axis into windows of length, AT, and summing the events in each 
window, 

rt+AT/2 

x{t ) = / s{t')dt'. ( 1 ) 

J t—AT/2 

A plot of tweet rate signals is shown for a few brands in Fig. [lj where both a regular daily 
variation and an irregular bursty behavior on top are distinctly visible. Burstiness is known 
to be inherent to individual human dynamics [TEj and to have an impact on information 
spreading HZ]. Here we see that bursts also appear in the aggregated interest level of many 
users in a large-scale social organization. 



FIG. 2. Interpretation of the update formula in Eq. ([2]). If the signal at some point takes 
the value 70 , then a small time step, dt, later, its value will be realized from a Gaussian with a 
mean determined by /( 70 ) and a spread determined by 5 ( 70 )- By performing statistics over many 
such realizations we may therefore obtain the drift and diffusion. 


B. The algorithm 

We now introduce a method to uncover the underlying stochastic properties of a time 
signal [18]. The method is based on the assumption that a signal, 7 (f), is generated by a 
stochastic differential equation (SDE) on the form 


dl = fin) dt + g{i)dW. 


( 2 ) 
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Here dW is a random Gaussian variable with mean, (dW) = 0, and variance, ( dW 2 ) = dt. 
The first term in Eq. ([2]) gives the deterministic drift, while the second term gives the random 
diffusion. The differential equation will here be handled using Ito calculus. Note that the 
above equation is assumed to describe the dynamics of the global user interest and not the 
observed tweet rates given in Eq. ([Tj) . Below we shall relate the two quantities. 

It may be shown in the Fokker-Planck formalism [T9] that if the variable takes the value, 
7 o, at time, t, then at some small time step, dt, later, it will be a random variable from 
a Gaussian distribution with mean, q 0 + f('y 0 )dt, and spread, g{^o)dt 1 ^ 2 (see Fig. [ 2 J. It is 
therefore possible to get an estimate of /( 70 ) and < 7 ( 70 ) by binning all the signal values close 
to 70 and then construct the corresponding distribution of signal values one time step later. 
From this distribution one may read off the mean and the spread to get the estimates of 
/( 70 ) and g( 70 ). The procedure is then repeated over the whole range of realized signal 
values in order to estimate the functional forms of /( 7 ) and g( 7 ), respectively. 

In Fig. [3] we show the result of applying the analysis to a signal generated by 

d 1 = (0.47 - 0.437) dt + 0.237 1 ' 5 dIF, (3) 

Comparing the analytical functions with the estimates, we get i? 2 -values of 0.98 and 0.97 
for the drift and diffusion respectively. The functional form of the drift and diffusion used 
here are equivalent to the ones fitted for ’’Samsung” below, and to make the comparison 
complete, we have also used the same signal length, N = 74,646, and time step, dt = 1. 
Note that bins with less than 20 data points have not been included, due to the otherwise 
poor statistics, and therefore 7 only assumes values between 0.26 and 2.35. 


III. RESULTS 

A. Model 

The stochastic differential equation, Eq. (J2]) is formulated in the probability, j(t), for a 
random tweet to mention a specific brand and not in the tweet rate. We call y(i) for the 
’’global user interest”. In fact, the expected number of tweets on a given topic, (x(t)) p , in a 
time window, At, is given by the full number of tweets posted on Twitter within this time 
window, A(t), times the probability for any such tweet to mention the given topic 7 (f), 
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t 




FIG. 3. Application of the algorithm to a generic signal. A time series generated according 
to Eq. ([ 3 ]) is shown in (A) and below we show the analytic drift, (B), and diffusion, (C), along 
with the functional forms estimated by the algorithm. 


(x(t)) P = l(t)A(t). (4) 

Here the expectation value, (-) p , refers to the Poisson weighted average of all the possible 
realizations of the tweet rate. The actual tweet rate signal is one such realization drawn 
from a Poisson distribution 


x{t) = Pois . 


( 5 ) 
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The above equation summarizes the basic structure of our model: the observed signal, x(t), 
is realized from a Poissonian with a mean given by the product between the user interest, 
7 (£), and the activity, A(t). 

Within the activity, A(t), we also include any factors depending on regional differences, 
since the global composition of active users is changing during a daily cycle. We will assume 
that A(t) may be approximated as a deterministic and periodic function of time and that 
7 (t) is reasonably described by the SDE in Eq. (j2]). The goal of the following data analysis 
is to find the functional form of the drift, /(y), and diffusion, g( 7 ), of the SDE. 


B. Data analysis 

It is problematic to apply the algorithm introduced in the methods section to the tweet 
rate signals, x(t), since we only expect it to apply to the underlying tweet probability, y(t). 
We may reduce noise from the Poisson statistics by increasing the time window, At, thereby 
increasing the expected number of tweets and reducing the relative size of the Poisson noise. 
However, if we increase the time window too much, then we enter the domain of the mean 
held theory, where the time resolution is too low to see the dynamics of the 7 (t)-huctuations 
since they for larger times are dampened by the drift term. Also, for a limited time series 
we do not want to lower the time resolution, since it reduces the number of data points 
available for the analysis. In the following, we have chosen a time window of 30 seconds 
giving us approximately 80,000 data points for each brand. Unfortunately, this window size 
does not allow us to ignore the Poisson noise for any of the brands. We do however expect 
the time window to be small enough to resolve the important dynamics of the user interest. 

The second problem that we face by applying the algorithm is the presence of the activity, 
A(t), relating the observed signal, x(t), to the signal of interest, 7 (£). In the following analysis 
we will assume that the activity is a deterministic function of time with a daily period. One 
would naturally expect it to also have a weekly variation along with a variation on slower 
time scales, but here we will be interested in time scales below the resolution of a day, why 
it makes sense to approximate the activity by a daily period. 

We may estimate the daily variation by averaging x(t) over many days to obtain a variable 
that is proportional to the activity 


(x) D {t) = (Pois ( 7 (t)A(t))) D , 

= (7 (t)A(t)) D , 

= (7(0)d (^W)d ) 

= (7 )r^W- ( 6 ) 

Here we introduced two more expectation values: (-) D is the average of repeated measure¬ 
ments at the same time of day and (-) T is the general time average. We have used that 
Poissonians sum to a Poissonian with an expectation value that is the sum of the individual 
expectation values. We have also used that A(t) and 7 (t) are uncorrelated in our model, 
that 7 (t) is independent of absolute time and that A{t) is a periodic function. I 11 practice, 
the average is performed over 20 to 62 days of measurements and by smoothening data to 
a time resolution of 15 minutes. Using the obtained information, we may construct the 
variable 


7 (f) = 


x(t) 

(x) D (tY 

Pois ( 7 (t)A(t)) 


(i) T m ’ 

which is proportional to 7 (t) if one averages out the Poisson noise 


(7) 


<7 (*)>p = 


tW_ 

(i)t 


( 8 ) 


The variable 7 (t) is the closest approximation we get of 7 (t) by our analysis. To see the 
effect of the Poisson statistics on our analysis, we have also generated the signal 


- Poi s((z) D (P)) 

( j “ (x) D (t) ’ 

_ p ois((7)r A(t)) 
(l) T A(t) ' 


(9) 


and applied the algorithm to both 7 (t) and x(t). The variable x(t) is equivalent to 7 (t), but 
with the dynamics of 7 (t) replaced by the mean value (y) T (compare Eqs. Q and g). By 
applying the algorithm to both 7 (£) and x(t), we hope to be able to separate the effect of 
the daily variation and the Poisson statistics from the actual dynamics of 7 (t). 
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In Fig. [I]A we show 7 (t) and the corresponding instance of x(t ) for “Samsung”. Note that 
the Poisson noise of x(t ) is not enough to explain the bursty behavior observed for the tweet 
rates. The resulting drift (Fig. B 3 ) and diffusion (Fig. |4p) terms are estimated using our 
algorithm on the two signals shown below. The drift of 7 (t) has been fitted with a function 
on the form 

/ ( 7 ) = a/ - &/7- ( 10 ) 

A best fit yields the coefficients aj = 0.47 and bf = 0.43. Similarly, the diffusion has been 
fitted using a function on the form 

9 (7) = a 9 + Vf 3/2 ; ( n ) 

with a g = 0.31 and b g = 0.23. The analysis has been performed in dimensionless time, 
t —> t/At, such that dt = 1. We have been unable to estimate the error bars in the presence 
of the Poisson statistics. 

Our algorithm estimates a linear drift term for both the data, 7 (f), and for the synthetic 
signal, x(t). For x(t), we find a coefficient bf = 1, which is expected from a Poisson process. 
The fact that we find bf = 0.43 for 7 (t) shows that the data is more rich than a simple 
homogeneous or a weakly inhomogeneous Poisson process. In other words, the fluctuations 
of 7 (t) are comparable or stronger than the fluctuations generated by the superimposed 
Poisson process. While we cannot quantify the influence of the Poisson process on the 
linear drift, we are confident that 7=1 is the only stationary point of 7 , corresponding to a 
potential, 1 /( 7 ) = ,/(, with just a single minimum. Furthermore, we do not expect 

the drift to depart significantly from a linear form around and above the fix point. A drift 
term of this form limits the signal and allows bursts to be generated by the multiplicative 
diffusion term. 

In the plot showing the diffusion terms, we find that the effect of the Poisson statistics 
is very distinctly visible as a constant background noise. It indeed matches the size of 
the coefficient a g pretty well. We therefore propose that the this first term is due Poisson 
noise and therefore that the underlying variable 7 (t) is described solely by the second term, 

g( 7 ) = W /2 - 

We apply the same analysis to the other brand names and provide in Table [T] the length of 
the fitted data series, At data , the mean tweet rate, (x), the ratio between the maximum and 
minimum of the daily variation, DV = max [(x) D (f)] /min [(x) D (f)] , and the goodness-of-ht 
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Brand 

-^data 

(*> 

DV 

R 2 

Samsung 

74,646 

10.2 

1.8 

0.96 

Google 

81,105 

67.9 

1.8 

0.95 

Gucci 

74,715 

19.3 

8.7 

0.91 

BMW 

79,599 

3.1 

2.4 

0.86 

Heineken 

179,607 

1.5 

3.2 

0.74 

Starbucks 

72,570 

24.0 

5.3 

0.57 

Pepsi 

57,548 

7.2 

4.8 

0.53 


TABLE I. Brand name characteristics For each brand name we show the length of the time 
series, N^ ta , the mean tweet rate, (x), the ratio between the minimum and maximum of the daily 


variation, ”DV”, and the -Revalues for the fit of Eq. 11 to the diffusion terms estimated by the 
algorithm. 


values for the diffusion term, R 2 . We find that the fit captures the observed diffusion well 
for 4 of the 7 brand names, but it performs poorly for the last 3. We believe that this is the 
result of applying the algorithm to a limited time series under the effects of daily variation 
and Poisson noise. As an example of this, we show in Fig. [5] the result of applying the 
algorithm to the tweet rate signal of ” Starbucks”. We see that a Poisson process captures 
most of the fluctuations found in the dynamical signal, i.e. the diffusion terms of 7 and x 
are approximately equal. We therefore conclude that the average interest, (7), the big daily 
variation, A(t), and the Poisson noise is enough to explain most of the signal for ’’Starbucks”. 
This leaves very little room in the analysis to capture the dynamics of 7 (t) (compare with 
Fig.g and may explain the poor performance of the fit. 

In general, however, we find that the analysis of 7 (t) provides support to the hypothesized 
noise exponent of 3/2. We therefore propose that the global user interest is described by 
the following model 


d 7 (t) = f(j)dt + ky 3/2 dW, (12) 

where /(y) is a slowly decreasing drift term derived from a single well potential. We em¬ 
phasize that in order to derive this result, we assume that 7 (t) is described by the stochastic 
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differential equation, Eq. ((2|) , and that A{t) can be approximated by a periodic function 
with a daily period. Finally, our method works best when the Poisson fluctuations are not 
too strong. 

In the next section we show that if the single well potential defining /(y) is approximated 
by an infinite well, we obtain a probability distribution with a power law exponent of -3 
and a power spectrum with a power law exponent of -1. This is in agreement with the 
characteristic behavior of the brand name signals analyzed in [3]. 


DISTRIBUTION AND POWER SPECTRUM FROM MODEL 


To derive the probability distribution and power spectrum for the model proposed in Eq. 


(12) we switch from the Langevin equation to the corresponding Fokker-Planck formulation 


9 t P(l,t) = Si I 


(13) 


Here we have approximated the drift potential by an infinite well. This yields a vanishing 
drift in the region 7 E [y min , 7max] and reflective boundaries at the effective potential walls 
7 min and y max . One finds the stationary distribution 


Ps( 7) = - 3 

ryO 


(14) 


where N is the normalization constant. The same asymptotic power law is found in the case 
of a linear drift, which is promising since it matches the behavior of the data. 


Eq. (13) may be solved using the method of eigenfunctions as explained in [20]. One 
finds that for an intermediate range of frequencies the power spectrum scales as 


W) ~ T 


(15) 


which is also the case for the data. 


The model proposed for the dynamics of interest, Eq. (12), is therefore successful at simul¬ 


taneously explaining the scaling exponents of the signal distribution and the corresponding 
power spectrum. To show the validity of the infinite well approximation, we conclude the 
paper with a simulation of the model with a linear drift 


dy = (1 — 0.1y)dt + y 3//2 dIU, 


(16) 
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An efficient and accurate numerical integration may be performed by considering the inverse 
variable, r = I/ 7 , which may be integrated using the splitting up method [|2TJ. In Fig. [ 6 ] 
we show the distribution and power spectrum for the simulated signal, and we observe that 
the linear drift is consistent with the power laws observed in the data. The exponent of 
the distribution is fitted to ol\ = —2.961 ± 0.002 using the the maximum likelihood routine 
introduced in [5]. The exponent of the power spectrum is found to be 0:2 = — 0.98±0.03 using 
a logarithmic binning and a least squares fit. The corresponding errorbars are estimated by 
bootstrapping. 


CONCLUSION 

In this paper we have studied the dynamics of interest in global brands by analyzing 
tweet rates on the online social media site Twitter. As a result of the correlations in the 
user behavior, the rates are found to be bursty and distributed as a power law with an 
exponent of -3 and have a power spectrum inversely proportional to the frequency. Since 
the global interest in a brand name is the result of many random events, we have proposed 
to model it by a stochastic differential equation with a simple drift and a diffusion like 
term. By analyzing the fluctuations in the tweet rate signals, we find that the diffusion term 
scales like a power with an exponent of 3/2. The derived diffusion term may explain the 
pronounced burstiness and the 1 /f noise observed for the tweet rate signals. 

It remains an open question whether the dynamics observed for the brand names on 
Twitter can also be observed for the occurrence of other keywords or even in other large 
social organizations? Another interesting question, which we have not addressed with our 
model is, what is the detailed behavior of individual humans that leads to correlated behavior 
given by our model? I 11 general, the growing information available on human behavior in 
global-scale social organizations has helped answer parts of these questions and further 
analysis along the lines of this paper might provide a more complete picture. 
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Supporting Information SI (pdf file). Information on the data collection. 


Supporting Information S2 (Compressed gzip file of a tar archive). Timestamp data for 
tweets. 
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FIG. 4. Application of the algorithm to 7 (t) and x(t ) for the brand name ”Samsung”. 

The two signals are shown in A and below we see the drift terms, B, and diffusion terms, C, 


estimated by the algorithm. Also shown are the fits of the functions in Eq. (10) and © to the 
estimated drift and diffusion of y(t). 
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FIG. 5. Estimated diffusion for the signals 7 (t) and x(t) for ”Starbucks”. Note that the 
diffusion estimated for the two signals is almost equal for this brand name, therefore making it 
hard to filter out the effect of the dynamical interest 7 (f) present in the signal of 7 (t). 
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FIG. 6. Plots of the probability density function (A) and power spectrum (B) for a 
simulation of the model in Eq. (|16[). Note that the power law exponents of -1 and -3 match 


those of the data. 












